Extracting Synchronous Grammar Rules From Word-Level Alignments in Linear Time
نویسندگان
چکیده
We generalize Uno and Yagiura’s algorithm for finding all common intervals of two permutations to the setting of two sequences with many-to-many alignment links across the two sides. We show how to maximally decompose a word-aligned sentence pair in linear time, which can be used to generate all possible phrase pairs or a Synchronous Context-Free Grammar (SCFG) with the simplest rules possible. We also use the algorithm to precisely analyze the maximum SCFG rule length needed to cover hand-aligned data from various language pairs.
منابع مشابه
Unsupervised Discriminative Induction of Synchronous Grammar for Machine Translation
We present a global log-linear model for synchronous grammar induction, which is capable of incorporating arbitrary features. The parameters in the model are trained in an unsupervised fashion from parallel sentences without word alignments. To make parameter training tractable, we also propose a novel and efficient cube pruning based synchronous parsing algorithm. Using learned synchronous gra...
متن کاملBayesian Extraction of Minimal SCFG Rules for Hierarchical Phrase-based Translation
We present a novel approach for extracting a minimal synchronous context-free grammar (SCFG) for Hiero-style statistical machine translation using a non-parametric Bayesian framework. Our approach is designed to extract rules that are licensed by the word alignments and heuristically extracted phrase pairs. Our Bayesian model limits the number of SCFG rules extracted, by sampling from the space...
متن کاملImprovements in Hierarchical Phrase-based Statistical Machine Translation
Hierarchical phrase-based translation (Hiero) is a statistical machine translation (SMT) model that encodes translation as a synchronous context-free grammar derivation between source and target language strings (Chiang, 2005; Chiang, 2007). Hiero models are more powerful than phrase-based models in capturing complex source-target reordering as well as discontiguous phrases, while being easier ...
متن کاملEnriching SCFG rules directly from efficient bilingual chart parsing
In this paper, we propose a new method for training translation rules for a Synchronous Context-free Grammar. A bilingual chart parser is used to generate the parse forest, and EM algorithm to estimate expected counts for each rule of the ruleset. Additional rules are constructed as combinations of reliable rules occurring in the parse forest. The new method of proposing additional translation ...
متن کاملA Formal Characterization of Parsing Word Alignments by Synchronous Grammars with Empirical Evidence to the ITG Hypothesis
Deciding whether a synchronous grammar formalism generates a given word alignment (the alignment coverage problem) depends on finding an adequate instance grammar and then using it to parse the word alignment. But what does it mean to parse a word alignment by a synchronous grammar? This is formally undefined until we define an unambiguous mapping between grammatical derivations and word-level ...
متن کامل